[116] Briefing: Big Data Hadoop News - 2016 Q2

Apache Storm

Storm released version 1.0.0. Key features:

Apache Spark & Apache Zeppelin enhancements in HDP 2.4.2

  • Certified SparkSQL with ODBC (ODBC driver available from Hortonworks).
  • Bug fixes in Spark Oozie action for a Kerberos enabled cluster.
  • Spark Streaming with Apache Kafka support in a Kerberos enabled cluster.
  • SparkSQL & ORC performance improvements.
  • Final technical preview of Apache Zeppelin that includes Kerberos support, LDAP Authentication, and identity propagation.
    http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/

Cloudera Engineering

How-to: Detect and Report Web-Traffic Anomalies in Near Real-Time
http://blog.cloudera.com/blog/2016/06/how-to-detect-and-report-web-traffic-anomalies-in-near-real-time/
Best Practices for Enterprise Data Hub Encryption
http://blog.cloudera.com/blog/2016/06/best-practices-for-enterprise-data-hub-encryption/
How-to: Analyze Fantasy Sports with Apache Spark and SQL (Part 2: Data Exploration)
http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-with-apache-spark-and-sql-part-2-data-exploration/
How-to: Analyze Fantasy Sports using Apache Spark and SQL
http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-using-apache-spark-and-sql/
https://spark-summit.org/2016/schedule/
Guide to Configuring Apache Impala (incubating) for HA with F5 BIG-IP
http://blog.cloudera.com/blog/2016/05/guide-to-configuring-apache-impala-incubating-for-ha-with-f5-big-ip/
http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf
Multi-node Clusters with Cloudera QuickStart for Docker
http://blog.cloudera.com/blog/2016/08/multi-node-clusters-with-cloudera-quickstart-for-docker/
Livy, the Open Source REST Service for Apache Spark, Joins Cloudera Labs
http://blog.cloudera.com/blog/2016/07/livy-the-open-source-rest-service-for-apache-spark-joins-cloudera-labs/
Untangling Apache Hadoop YARN, Part 4: Fair Scheduler Queue Basics
http://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/
New Study: Evaluating Apache HBase Performance on Modern Storage Media
http://blog.cloudera.com/blog/2016/06/new-study-evaluating-apache-hbase-performance-on-modern-storage-media/
https://software.intel.com/sites/default/files/managed/95/0d/Optimize%20Hadoop%20Cluster%20Performance%20with%20Various%20Storage%20Media%20334463-001US.pdf
How-to: Process and Index Medical Images with Apache Hadoop and Apache Solr
http://blog.cloudera.com/blog/2016/05/how-to-process-and-index-medical-images-with-apache-hadoop-and-apache-solr/
How-to: Configure SAP HANA with Apache Impala (incubating)
http://blog.cloudera.com/blog/2016/05/how-to-configure-sap-hana-with-apache-impala-incubating/
How-to: Build a Prediction Engine using Spark, Kudu, and Impala
http://blog.cloudera.com/blog/2016/05/how-to-build-a-prediction-engine-using-spark-kudu-and-impala/
How-to: Improve Apache HBase Performance via Data Serialization with Apache Avro
http://blog.cloudera.com/blog/2016/05/how-to-improve-apache-hbase-performance-via-data-serialization-with-apache-avro/
Inside Santander’s Near Real-Time Data Ingest Architecture (Part 2)
http://blog.cloudera.com/blog/2016/05/inside-santanders-near-real-time-data-ingest-architecture-part-2/
Inside Santander’s Near Real-Time Data Ingest Architecture
http://blog.cloudera.com/blog/2015/08/inside-santanders-near-real-time-data-ingest-architecture/
Apache Impala (incubating) in CDH 5.7: 4x Faster for BI Workloads on Apache Hadoop
http://blog.cloudera.com/blog/2016/04/apache-impala-incubating-in-cdh-5-7-4x-faster-for-bi-workloads-on-apache-hadoop/
New in Cloudera Manager 5.7: Cluster Utilization Reporting
http://blog.cloudera.com/blog/2016/04/new-in-cloudera-manager-5-7-cluster-utilization-reporting/
Cloudera Enterprise 5.7 is Released
http://blog.cloudera.com/blog/2016/04/cloudera-enterprise-5-7-is-released/
How-to: Use Impala and Kudu Together for Analytic Workloads
http://blog.cloudera.com/blog/2016/04/how-to-use-impala-and-kudu-together-for-analytic-workloads/
Quality Assurance at Cloudera: Running/Upgrading to New Releases on Our Own EDH Cluster
http://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-runningupgrading-to-new-releases-on-our-own-edh-cluster/
Quality Assurance at Cloudera: Fault Injection and Elastic Partitioning
http://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/
Benchmarking Apache Parquet: The Allstate Experience
http://blog.cloudera.com/blog/2016/04/benchmarking-apache-parquet-the-allstate-experience/

Cloudera Vision

How GoPro uses Apache Hadoop in the Cloud
https://vision.cloudera.com/gopro-hadoop-cloud/
SQL-on-Apache Hadoop – Choosing the right tool for the right job
https://vision.cloudera.com/sql-on-apache-hadoop-choosing-the-right-tool-for-the-right-job/
New Open-Source Service Enables Apache Spark Development
https://vision.cloudera.com/new-open-source-service-enables-apache-spark-development/
Tuning Hive on Spark
http://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_tuning.html
http://www.cloudera.com/documentation/enterprise/latest/topics/admin_performance.html
Faster Batch Processing with Hive-on-Spark
https://vision.cloudera.com/faster-batch-processing-with-hive-on-spark/
Beyond ETL: Real-time, Streaming Architectures
https://vision.cloudera.com/beyond-etl-real-time-streaming-architectures/
The One Platform Initiative Delivers
https://vision.cloudera.com/the-one-platform-initiative-delivers/

Hortonworks

Rack Awareness
https://community.hortonworks.com/articles/43057/rack-awareness-1.html
Rack Awareness Series 2
https://community.hortonworks.com/articles/43164/rack-awareness-series-2.html
Disaster recovery and Backup best practices in a typical Hadoop Cluster: Series 1 Introduction
https://community.hortonworks.com/articles/43525/disaster-recovery-and-backup-best-practices-in-a-t.html
MICROBENCHMARKING APACHE STORM 1.0 PERFORMANCE
http://hortonworks.com/blog/microbenchmarking-storm-1-0-performance/
TOP 5 ARTICLES ON HADOOP
http://hortonworks.com/blog/top-5-articles-hadoop/
TOP ARTICLES AND QUESTIONS FROM HCC LAST WEEK
http://hortonworks.com/blog/top-articles-questions-hcc-last-week/
HIVE LLAP TECHNICAL PREVIEW ENABLES SUB-SECOND SQL ON HADOOP AND MORE
http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/
APACHE METRON TECH PREVIEW 2 AVAILABLE NOW!
http://hortonworks.com/blog/apache-metron-technical-preview-2/
LATEST INNOVATION WITHIN HORTONWORKS DATA PLATFORM (HDP) 2.5 UNVEILED
http://hortonworks.com/blog/latest-innovation-within-hortonworks-data-platform-hdp-2-5-unveiled/
SPARK-ON-HBASE: DATAFRAME BASED HBASE CONNECTOR
http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
UNDER-THE-HOOD WITH AMBARI METRICS AND GRAFANA
http://hortonworks.com/blog/hood-ambari-metrics-grafana/
A BRIEF HISTORY OF APACHE STORM
http://hortonworks.com/blog/brief-history-apache-storm/
HORTONWORKS HDP AND SAS EVENT STREAM PROCESSING TOGETHER, USING YARN
https://hortonworks.com/blog/hortonworks-hdp-and-sas-event-stream-processing-together-using-yarn/
APACHE SPARK & APACHE ZEPPELIN: WHAT’S COMING IN HDP 2.4.2
http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/
ANNOUNCING CLOUDBREAK 1.2
http://hortonworks.com/blog/announcing-cloudbreak-1-2/
ANNOUNCING APACHE STORM 1.0.0
http://hortonworks.com/blog/announcing-apache-storm-1-0-0/
THE NEXT GENERATION OF HADOOP-BASED SECURITY & DATA GOVERNANCE
http://hortonworks.com/blog/the-next-generation-of-hadoop-based-security-data-governance/
ADVANCED METRICS VISUALIZATION DASHBOARDING WITH APACHE AMBARI
http://hortonworks.com/blog/advanced-metrics-visualization-dashboarding-apache-ambari/
STREAMLINING APACHE HADOOP OPERATIONS
http://hortonworks.com/blog/streamlining-apache-hadoop-operations/
THE NEXT MARKET LEADERS WILL POWER THEIR BUSINESSES FROM IOAT DATA SOURCES
http://hortonworks.com/blog/next-market-leaders-will-power-businesses-ioat-data-sources/

Databricks

Preview of Apache Spark 2.0 now on Databricks Community Edition
https://databricks.com/blog/2016/05/11/spark-2-0-technical-preview-easier-faster-and-smarter.html
Spark Trending in the Stack Overflow Survey
https://databricks.com/blog/2016/03/22/spark-trending-in-the-stack-overflow-survey.html
http://stackoverflow.com/research/developer-survey-2016
Continuous Integration and Delivery of Spark Applications at Metacog
https://databricks.com/blog/2016/04/06/continuous-integration-and-delivery-of-spark-applications-at-metacog.html

MapR

IoT Spotlight: Sensor to Dashboard – Real-Time Stream Processing for Oil and Gas
https://www.mapr.com/blog/iot-spotlight-sensor-dashboard-real-time-stream-processing-oil-and-gas
Using MapR, Mesos, Marathon, Docker, and Apache Spark to Deploy and Run Your First Jobs and Containers
https://www.mapr.com/blog/using-mapr-mesos-marathon-docker-and-apache-spark-deploy-and-run-your-first-jobs-and-containers
Apache Apex on MapR Converged Platform
https://www.mapr.com/blog/apache-apex-mapr-converged-platform
Monitoring a MapR Cluster with Elasticsearch + Kibana
https://www.mapr.com/blog/monitoring-mapr-cluster-elasticsearch-kibana
Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming
https://www.mapr.com/blog/real-time-credit-card-fraud-detection-apache-spark-and-event-streaming
Fast, Scalable, Streaming Applications with the Kafka API (MapR Streams), Spark Streaming, and the HBase API (MapR-DB)
https://www.mapr.com/blog/fast-scalable-streaming-applications-kafka-api-mapr-streams-spark-streaming-and-hbase-api-mapr